Denoise algorithm

This notebook defines the denoise algorithm (step C defined in Towsey 2013) and compares the speed of different implementations. This is a step in processing recordings of the natural environment that "better preserves the structural integrity of complex acoustic events (e.g. bird calls) but removes noise from background locations further removed from that event (Towsey 2013)."

Towsey, Michael W. (2013) Noise removal from wave-forms and spectrograms derived from natural recordings of the environment.

Required packages

Import statements


In [1]:
import numpy as np
from scipy.ndimage import generic_filter
from numba import jit, guvectorize, float64
import pyprind
import matplotlib.pyplot as plt
%matplotlib inline

python

implementation using pure python


In [2]:
def denoise(a, b):
    for channel in range(2):
        for f_band in range(4, a.shape[1] - 4):
            for t_step in range(1, a.shape[2] - 1):
                neighborhood = a[channel, f_band - 4:f_band + 5, t_step - 1:t_step + 2]
                if neighborhood.mean() < 10:
                    b[channel, f_band, t_step] = neighborhood.min()
                else:
                    b[channel, f_band, t_step] = neighborhood[4, 1]
    return b

scipy

implementation using scipy.ndimage.generic_filter—the custom callback function is just-in-time compiled by numba


In [3]:
@jit(nopython=True)
def filter_denoise(neighborhood):
    if neighborhood.mean() < 10:
        return neighborhood.min()
    else:
        return neighborhood[13]

In [4]:
def denoise_scipy(a, b):
        for channel in range(2):
            b[channel] = generic_filter(input=a[channel], function=filter_denoise, 
                                        size=(9, 3), mode='constant')
        return b

numba

implementation of a universal function via numba.guvectorize


In [5]:
# just removed return statement
def denoise_guvectorize(a, b):
    for channel in range(2):
        for f_band in range(4, a.shape[1] - 4):
            for t_step in range(1, a.shape[2] - 1):
                neighborhood = a[channel, f_band - 4:f_band + 5, t_step - 1:t_step + 2]
                if neighborhood.mean() < 10:
                    b[channel, f_band, t_step] = neighborhood.min()
                else:
                    b[channel, f_band, t_step] = neighborhood[4, 1]

serial version


In [6]:
denoise_numba = guvectorize('float64[:,:,:], float64[:,:,:]', '(c,f,t)->(c,f,t)', 
                            nopython=True)(denoise_guvectorize)

parallel version


In [7]:
denoise_parallel = guvectorize('float64[:,:,:], float64[:,:,:]', '(c,f,t)->(c,f,t)', 
                               nopython=True, target='parallel')(denoise_guvectorize)

check results

test the implementations on a randomly generated dataset and verfiy that all the results are the same


In [8]:
size = 100
data = np.random.rand(2, size, int(size*1.5))
data[:, int(size/4):int(size/2), int(size/4):int(size/2)] = 27

In [9]:
result_python = denoise(data, np.zeros_like(data))
result_scipy = denoise_scipy(data, np.zeros_like(data))
result_numba = denoise_numba(data, np.zeros_like(data))
result_parallel = denoise_parallel(data, np.zeros_like(data))

check if the different implementations produce the same result


In [10]:
assert np.allclose(result_python, result_scipy) and np.allclose(result_python, result_numba)

plot results


In [11]:
fig, ax = plt.subplots(2,2)
fig.set_figheight(8)
fig.set_figwidth(12)
im1 = ax[0, 0].imshow(data[0], cmap='viridis', interpolation='none', vmax=1)
t1 = ax[0, 0].set_title('data')
im2 = ax[0, 1].imshow(result_python[0], cmap='viridis', interpolation='none', vmax=1)
t1 = ax[0, 1].set_title('pure python')
im3 = ax[1, 0].imshow(result_scipy[0], cmap='viridis', interpolation='none', vmax=1)
t1 = ax[1, 0].set_title('scipy')
im4 = ax[1, 1].imshow(result_numba[0], cmap='viridis', interpolation='none', vmax=1)
t1 = ax[1, 1].set_title('numba')


profile for different data sizes

time the different implementations on different dataset sizes


In [12]:
sizes = [30, 50, 100, 200, 400, 800, 1600]

In [13]:
progress_bar = pyprind.ProgBar(iterations=len(sizes), track_time=True, stream=1, monitor=True)
t_python = np.empty_like(sizes, dtype=np.float64)
t_scipy = np.empty_like(sizes, dtype=np.float64)
t_numba = np.empty_like(sizes, dtype=np.float64)
t_parallel = np.empty_like(sizes, dtype=np.float64)
for size in range(len(sizes)):
    progress_bar.update(item_id=sizes[size])
    data = np.random.rand(2, sizes[size], sizes[size])*0.75
    t_1 = %timeit -oq denoise(data, np.zeros_like(data))
    t_2 = %timeit -oq denoise_scipy(data, np.zeros_like(data))
    t_3 = %timeit -oq denoise_numba(data, np.zeros_like(data))
    t_4 = %timeit -oq denoise_parallel(data, np.zeros_like(data))
    t_python[size] = t_1.best
    t_scipy[size] = t_2.best
    t_numba[size] = t_3.best
    t_parallel[size] = t_4.best


0%   100%
[#######] | ETA: 00:00:00 | Item ID: 1600
Total time elapsed: 00:02:30

plot profile results


In [14]:
fig, ax = plt.subplots(figsize=(15,5))
p1 = ax.loglog(sizes, t_python, color='black', marker='.', label='python')
p2 = ax.loglog(sizes, t_scipy, color='blue', marker='.', label='scipy')
p3 = ax.loglog(sizes, t_numba, color='green', marker='.', label='numba')
p4 = ax.loglog(sizes, t_parallel, color='red', marker='.', label='parallel')
lx = ax.set_xlabel("data array size (2 x n x n elements)")
ly = ax.set_ylabel("time (seconds)")
t1 = ax.set_title("running times of the 'denoise' algorithm")
ax.grid(True, which='major')
l = ax.legend()


conclusion

use the serial numba implementation